- North America > United States (0.04)
- Europe > Italy > Sicily (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Research Report > Experimental Study (0.46)
- Research Report > New Finding (0.46)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Ohio (0.04)
- North America > United States > Maryland > Baltimore County (0.04)
- (3 more...)
- Law (0.68)
- Information Technology > Security & Privacy (0.46)
Appendix A
Figure: Perplexity vs. FLOP count of MIM compared to left-to-right baselines across model sizes.
To evaluate the effectiveness of "Meet in the Middle" (MIM) pre-training compared to left-to-right …
Figure: Perplexity vs. training time of MIM compared to left-to-right baselines across model sizes.
Our largest models, of size 2.7B parameters, are trained using 128 A100 GPUs with 80GB …
See Table 10 for the details of all the training runs.
This paper presents "Meet in the Middle", a novel pretraining paradigm for language models that …
The proposed method's secondary benefits in the infilling task could also improve several NLP tasks, such as text summarization and question answering, leading to better usability and overall …
- North America > United States > New York > New York County > New York City (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning
Deep neural networks have seen great success in recent years; however, training a deep model is often challenging, as its performance heavily depends on the hyper-parameters used. In addition, finding the optimal hyper-parameter configuration, even with state-of-the-art (SOTA) hyper-parameter optimization (HPO) algorithms, can be time-consuming, requiring multiple training runs over the entire dataset for different possible sets of hyper-parameters. Our central insight is that using an informative subset of the dataset for the model training runs involved in hyper-parameter optimization allows us to find the optimal hyper-parameter configuration significantly faster. In this work, we propose AUTOMATA, a gradient-based subset selection framework for hyper-parameter tuning. We empirically evaluate the effectiveness of AUTOMATA in hyper-parameter tuning through several experiments on real-world datasets in the text, vision, and tabular domains. Our experiments show that using gradient-based data subsets for hyper-parameter tuning achieves significantly faster turnaround times, with speedups of 3-30x, while achieving performance comparable to that of the hyper-parameters found using the entire dataset.
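The abstract's central idea, training on a small subset whose aggregate gradient tracks the full dataset's gradient, can be sketched in a few lines. The greedy gradient-matching criterion below is illustrative only, not AUTOMATA's exact algorithm; the function name and per-example-gradient setup are assumptions:

```python
import numpy as np

def greedy_gradient_subset(per_example_grads, k):
    """Greedily pick k examples whose mean gradient best approximates the
    full-dataset mean gradient (a sketch in the spirit of gradient-matching
    subset selection; not the exact AUTOMATA procedure)."""
    target = per_example_grads.mean(axis=0)   # full-data gradient
    chosen = []
    residual = target.copy()                  # gradient mass not yet explained
    for _ in range(k):
        # score each example by alignment with the unexplained residual
        scores = per_example_grads @ residual
        scores[chosen] = -np.inf              # never pick an example twice
        chosen.append(int(np.argmax(scores)))
        # recompute what the current subset fails to capture
        residual = target - per_example_grads[chosen].mean(axis=0)
    return chosen

rng = np.random.default_rng(0)
grads = rng.normal(size=(200, 16))            # toy per-example gradients
subset = greedy_gradient_subset(grads, k=20)
approx = grads[subset].mean(axis=0)           # subset gradient estimate
```

In a real HPO loop, each candidate hyper-parameter configuration would then be trained on `subset` instead of all 200 examples, which is where the reported turnaround-time speedups come from.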
Analog Physical Systems Can Exhibit Double Descent
Dillavou, Sam, Rocks, Jason W, Wycoff, Jacob F, Liu, Andrea J, Durian, Douglas J
An important component of the success of large AI models is double descent, in which networks avoid overfitting as they grow relative to the amount of training data, instead improving their performance on unseen data. Here we demonstrate double descent in a decentralized analog network of self-adjusting resistive elements. This system trains itself and performs tasks without a digital processor, offering potential gains in energy efficiency and speed, but it must endure component non-idealities. We find that standard training fails to yield double descent, but a modified protocol that accommodates this inherent imperfection succeeds. Our findings show that analog physical systems, if appropriately trained, can exhibit the behaviors underlying the success of digital AI. Further, they suggest that biological systems might similarly benefit from over-parameterization.
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
- North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Russia (0.04)
- Europe > Germany (0.04)
- (11 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.68)
- Banking & Finance (0.67)
- Government (0.46)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Italy > Tuscany > Florence (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)